
using embeddings

What I Say | What shows | What I type
Ollama 0.1.22 just came out, hot on the heels of 0.1.21. Two releases in a week. Most of the new features were in the first one, but 22 cleans things up a bit. We will take a look at all the new features, and I am especially excited about one of them that can make a big impact for all users. | me talking
First we see a bunch of new models. Qwen, DuckDB-NSQL, Stable Code, Nous Hermes 2 Mixtral, and Stable LM 2 | scroll down on ollama.ai sorted by newest to highlight models
Then we got the announcement of the new official TypeScript/JavaScript library as well as the new official Python library, making it even easier to build great apps with Ollama. | show GitHub repo with the two libraries
Now let’s talk about CPU | me with overlay of Meghan Trainor singing "All About That Bass", overlay CPU or AVX
From the beginning, Ollama required AVX instructions to be available on the CPU. | me talking
AVX is all about fast floating-point vector math. | math equations floating in front of me, like that movie
It’s for applications that do a lot of vector calculations… like, oh, I don’t know… OLLAMA? | me with overlay of the Church Lady… instead of "Satan" say "Ollama"
And it seemed like a no-brainer to include as a requirement because it’s been on almost every CPU for over a decade. | me talking
Well, it turns out that there are a lot of spare machines out there with 15-year-old CPUs whose owners still need them to provide value, but Ollama would just crash on them. | the animation of a pile of CPUs
And there are some newer CPUs that support AVX2 but couldn’t take advantage of the improvements. | AVX guide on the Intel website
Well, both situations are now supported.
There were also occasions when the GPU wasn’t recognized and Ollama would just crash. | show an explosion
Well now, if not recognized, Ollama will fall back to using CPU only.
I think one of the common reasons this happened was you had an ancient Nvidia GPU that wasn’t supported by CUDA. You can get some super cheap GPUs on eBay, but sometimes the promise is too good to be true. | show eBay with old Nvidia cards
There is also better support for Nvidia GPUs in WSL.
Now some folks were seeing problems with Ollama where it would hang after 20 or so requests.
I tried replicating this, going on for an hour and hundreds of requests and had no issues, but one of the team members figured it out and got it resolved.
There are a bunch of folks using Ollama in countries with, ummm… questionable access to the Internet.
Either it’s slow or connections drop or lots of noise on the line.
You know, places like the US but also Eastern European countries, certain Central and South American countries, and others.
Well, if that connection dropped for whatever reason during a pull or push, you would get a cryptic error. Those should be resolved.
So all of those are huge fixes if you were affected by them, but I would imagine 90% of users weren’t touched by them. But man, those 10% are certainly vocal. | show a crowd with a few shouting
Let’s look at some settings that help everyone out. First is the new MESSAGE directive in the Modelfile. | show docs with MESSAGE in the Modelfile
Recently, the chat endpoint was added to the API. This made it super easy to add a few-shot prompt to a model, which is great for providing examples of what you want. For instance, if you want JSON output, it helps to provide the schema in the prompt, and things get even better if you include some examples. | add the messages to the API call, showing examples
Well, now you can provide those examples in the Modelfile as well. You use MESSAGE, then the role, which looks like it’s limited to user and assistant, and then the question or answer. | do the same in the Modelfile
Pretty cool stuff.
In the beginning, Ollama just let you configure all the settings for a model in the Modelfile. | type a bunch of parameters in the Modelfile
Over time, more of that config also happened in the Ollama REPL with the /set commands. You can update the system prompt, the template, or any of the parameters there. | the same params in the /set command
But there wasn’t an easy way to serialize your new creation. Well, now there is a /save command that lets you save what you have done as a new model, which you can then load again later on. | use the /save command
And when you are done with a model, you can use /load and a model name, and you will get a new model loaded. | use the /load command
So maybe you want to switch from llama2 to mistral. That’s super easy to do. | | ollama run llama2
why is the sky blue
/load mistral
You can also use it as a way to clear the context. So if you are in mistral, try /load mistral and it will forget the context from the previous conversation. | ask a series of questions, then /load mistral again
One more thing that I am excited about is that the /show parameters command will now output the correct settings. Often I would set temp to 0.9 or 1.2, but /show parameters would say that temp was 1. Now it will output the correct values. | | /set parameter temperature 0.9
/show parameters
And that is what’s new in versions 0.1.21 and 0.1.22 of Ollama. | me
I think these have a lot for everyone and are going to be magic for a few.
What do you think of the messages in the modelfile feature? Is there something else that resonates with you in this new version?
If you find this useful, like and subscribe. Thanks so much for watching, goodbye.

Ollama 0.1.22 just came out. Hot on the heels of 0.1.21. Two releases in a week. Most of the new features were in the first one, but 22 cleans things up a bit.

First we see a bunch of new models: Qwen, DuckDB-NSQL, Stable Code, Nous Hermes 2 Mixtral, and Stable LM 2.

Then we got the announcement of the new official TypeScript/JavaScript library as well as the new official Python library, making it even easier to build great apps with Ollama.
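
If you want to kick the tires on the Python one, a minimal sketch looks something like this. It assumes you have installed the library with pip install ollama, the Ollama server is running locally, and you have already pulled llama2; the question is just an example prompt.

# Minimal sketch using the official ollama Python library.
# Assumes: `pip install ollama`, a local Ollama server, and `ollama pull llama2` already done.
import ollama

response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])

The TypeScript/JavaScript library follows the same shape, so whichever language you build in, the calls should feel familiar.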

Now let’s talk about CPU ((All About That Bass / CPU / AVX)). From the beginning, Ollama required AVX instructions to be available on the CPU. AVX is all about fast floating-point vector math ((math equations in front of me)). It’s for applications that do a lot of vector calculations… like, oh, I don’t know… Ollama ((Church Lady)). And it seemed like a no-brainer to include as a requirement because it’s been on almost every CPU for over a decade. ((CPU animation)) Well, it turns out that there are a lot of spare machines out there with 15-year-old CPUs that still need to provide value, and Ollama would crash on them. And there are some newer CPUs that support AVX2 but couldn’t take advantage of the improvements ((AVX2 guide)). Well, both situations are now supported.

There were also occasions when the GPU wasn’t recognized and Ollama would just crash. Well now, if not recognized, Ollama will fall back to using CPU only. I think one of the common reasons this happened was you had an ancient Nvidia GPU that wasn’t supported by CUDA. You can get some super cheap GPUs on eBay, but sometimes the promise is too good to be true.

There is also better support for Nvidia GPUs in WSL.

Now some folks were seeing problems with Ollama where it would hang after 20 or so requests. I tried replicating this, going on for an hour and hundreds of requests and had no issues, but one of the team members figured it out and got it resolved.

There are a bunch of folks using Ollama in countries with, ummm… questionable access to the Internet. Either it’s slow or connections drop or lots of noise on the line. You know, places like the US but also Eastern European countries, certain Central and South American countries, and others. Well, if that connection dropped for whatever reason during a pull or push, you would get a cryptic error. Those should be resolved.

So all of those are huge fixes if you were affected by them, but I would imagine 90% of users weren’t touched by them. But man, those 10% are certainly vocal.

Let’s look at some settings that help everyone out. First is the new MESSAGE directive in the Modelfile. Recently, the chat endpoint was added to the API. This made it super easy to add a few-shot prompt to a model, which is great for providing examples of what you want. For instance, if you want JSON output, it helps to provide the schema in the prompt, and things get even better if you include some examples. Well, now you can provide those examples in the Modelfile as well. You use MESSAGE, then the role, which looks like it’s limited to user and assistant, and then the question or answer. Pretty cool stuff.
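
As a rough sketch of what that could look like, here is a hypothetical Modelfile that bakes in a couple of few-shot JSON examples. The base model, the schema, and the example turns are all made up for illustration.

# Hypothetical Modelfile showing the MESSAGE directive for few-shot examples.
FROM llama2
SYSTEM Respond only with a JSON object containing the keys city and country.
MESSAGE user Where is the Eiffel Tower?
MESSAGE assistant {"city": "Paris", "country": "France"}
MESSAGE user Where is the CN Tower?
MESSAGE assistant {"city": "Toronto", "country": "Canada"}

Build it with something like ollama create json-geo -f Modelfile (json-geo is just a made-up name), and every chat with that model starts off as if those example turns had already happened.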

In the beginning, Ollama just let you configure all the settings for a model in the Modelfile. Over time, more of that config also happened in the Ollama REPL with the /set commands. You can update the system prompt, the template, or any of the parameters there. But there wasn’t an easy way to serialize your new creation. Well, now there is a /save command that lets you save what you have done as a new model, which you can then load again later on.
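
Inside the REPL, that whole flow is just a few commands. Something like this, where terse-llama is a made-up name for the saved model and the system prompt is only an example:

ollama run llama2
/set parameter temperature 0.3
/set system You are a very terse assistant who answers in one sentence
/save terse-llama

Later, ollama run terse-llama should start you off with those settings already applied, without touching a Modelfile.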

And when you are done with a model, you can use /load and a model name, and you will get a new model loaded. So maybe you want to switch from llama2 to mistral. That’s super easy to do. You can also use it as a way to clear the context. So if you are in llama2, try /load llama2 and it will forget the context from the previous conversation.
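
For example, switching models mid-conversation looks like this (the question is just a placeholder):

ollama run llama2
why is the sky blue
/load mistral

And if you want a clean slate without switching, running /load mistral again from inside mistral starts the conversation over with no memory of the earlier exchange.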

One more thing that I am excited about is that the /show parameters command will now output the correct settings. Often I would set temp to 0.9 or 1.2, but /show parameters would say that temp was 1. Now it will output the correct values.
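
So a quick sanity check like this should now reflect what you actually set:

/set parameter temperature 0.9
/show parameters

and the temperature in that list should now read 0.9 instead of a stale default.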

And that is what’s new in versions 0.1.21 and 0.1.22 of Ollama. I think these have a lot for everyone and are going to be magic for a few. If you find this useful, like and subscribe. Thanks so much for watching, goodbye.